\section{Proof of Lemma~\ref{lem:complexity of DISJ}}\label{sec:proof of DISJ}

Our proof adapts the proof of Razborov \cite{Razborov92} from the setting of a fixed partition of the inputs to a random one. In the following we keep our notation close to Razborov's; in particular, we use $N$ to replace the $b$ used elsewhere in this paper.
%
We first recall the standard definition of deterministic and randomized protocols here.
%
\begin{definition}
The communication cost of a deterministic protocol $P$ computing a function $f:{\cal X}\times{\cal Y}\to\{0,1\}$ on an input $x,y$ is the number of bits exchanged between Alice and Bob during $P$ on $x,y$. The communication cost of $P$ is the maximum communication cost over all inputs. The deterministic communication complexity of $f$ is the minimum communication cost over all protocols computing $f$.

A randomized protocol has to be correct with probability $2/3$ (over the random choices used in the protocol) on every input. The randomized communication complexity is then defined as above.
\end{definition}
%
In the above definition the inputs $x,y$ are partitioned in a fixed way. We are instead interested in the communication cost over an average partition of the inputs to Alice and Bob. In this situation the communication cost (for a fixed input $z$) is the average over all partitions of the bits of $z$ to Alice and Bob: for every partition $\rho$ of $z$ there is a protocol $P_\rho$ computing $f$, and the communication cost on $z$ is the average over the communication costs of all the $P_\rho$ on $z$. Here the distribution over partitions is the one in which every bit $z_j$ is assigned uniformly at random to Alice or Bob (but not both).

\begin{definition}
A {\em deterministic average partition} protocol $P$ consists of a protocol $P_\rho$ for every partition of the inputs $z=z_1,\ldots, z_\ell$. Every protocol $P_\rho$ has to be correct on all inputs partitioned according to $\rho$.

The average partition communication cost of a deterministic protocol $P$ on an input $z$ is the average number of bits exchanged by $P_\rho$, where the distribution on $\rho$ is defined such that each $z_j$ is assigned as input to Alice or Bob uniformly at random. The average partition communication cost of $P$ is then the maximum over all inputs.

The deterministic average partition communication complexity of a function $f$ is the minimum average partition communication cost over all protocols for $f$.

For randomized protocols the following correctness criterion must hold: for every input $z$ the probability that $P$ gives the correct output must be at least $2/3$, over both the random choices inside the $P_\rho$ and the choice of $P_\rho$ itself.
\end{definition}


We note that the notion of average partition in the definition above differs slightly from the notion of random partition defined in \Cref{sec:communication complexity}: for average partitions we are interested in the {\em expected} communication complexity, whereas for random partitions we are interested in the {\em worst-case} communication complexity. Since a protocol with worst-case communication complexity $c$ also has average communication complexity at most $c$, a lower bound for the average partition model is also a lower bound for the random partition model. In this section we prove a lower bound for the average partition model.


We will show a lower bound on the randomized average partition communication complexity of the Disjointness problem $\DISJ$.
This complexity measure is defined as a maximum over all inputs (of the average cost of randomized protocols under a random partition). As usual, by the easy direction of the Yao principle, it suffices to lower bound the expected cost of {\em deterministic} protocols, where the expectation is taken over a hard distribution on the inputs and over the random partitions.

Recall that the partitions $\rho$ of $\{z_1,\ldots, z_\ell\}$ are chosen uniformly. Furthermore, the inputs $z$ will be chosen from a distribution $\mu$ defined by Razborov. It is important that the inputs and their partition are chosen independently of each other. Due to this independence, when analyzing the error of a protocol we may choose the partition first, and then consider the error for random inputs under that fixed partition.


\begin{theorem} \label{thm:disj}
The randomized average partition communication complexity of the Disjointness problem $\DISJ$ is $\Omega(N)$.
\end{theorem}

%The rest of this section is devoted to proving the above theorem.
The theorem also holds if, in addition, Alice gets to know all of $x$ and Bob gets to know all of $y$.

\subsection*{Proof of Theorem~\ref{thm:disj}}

The inputs to Disjointness are $z=x_1,\ldots, x_N, y_1,\ldots, y_N$.
We first take a look at a simple property that most partitions have, namely that under most partitions many pairs $x_j,y_j$ are split between Alice and Bob. For simplicity we only count pairs where $x_j$ is with Alice and $y_j$ is with Bob.

After that we follow Razborov's proof for the hardness of Disjointness under a fixed distribution $\mu$, but we have to be more flexible with parameters, and restrict attention to a part of the distribution $\mu$ that is still hard (for a given partition of the inputs). Furthermore we have to analyze certain parts of rectangles separately to keep the product properties in Razborov's proof intact.

In the following $\delta$ will be a small all-purpose constant.
Denote the expected error of the protocol by $\epsilon$. This error is the expectation over inputs $z$ (from $\mu$) and over partitions $\rho$ of the input positions $\{1,\ldots, 2N\}$. We denote the expected error under a fixed partition $\rho$, i.e., the expected error of $P_\rho$, by $\epsilon_\rho$. By Markov's inequality, the probability (over $\rho$) that $\epsilon_\rho$ is larger than $\epsilon/\delta$ is at most $\delta$.
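Spelled out, with $E_\rho$ denoting expectation over the random partition:
\[\prob_\rho\left(\epsilon_\rho>\epsilon/\delta\right)\ \le\ \frac{E_\rho\left[\epsilon_\rho\right]}{\epsilon/\delta}\ =\ \frac{\epsilon}{\epsilon/\delta}\ =\ \delta.\]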

Besides the error, the communication cost on an input $z$ is also an expectation, this time over the choice of the partition. Recall that we are dealing with deterministic protocols $P_\rho$. Denote by $c$ the expected communication over the choice of $\rho$ and the choice of $z$, and by $c_\rho$ the expectation over $z$ with $\rho$ fixed.

In our main lemma we will show that, for most partitions $\rho$, low communication and small error are impossible simultaneously unless $c=\Omega(N)$.

We first define the input distribution used by Razborov for $\DISJ$ on inputs of size $N$.

\begin{definition}
Set $N=4m-1$, the size of the universe of elements.
To choose inputs according to Razborov's distribution one first chooses a {\it frame}. The frame consists of a partition of $\{1,\ldots, N\}$ into 3 disjoint subsets $z_x, z_y, \{i\}$ such that $|z_x|=|z_y|=2m-1$.

For the distribution $\mu_1$ on 1-inputs to $\DISJ$ one then chooses a subset $x$ of size $m$ of $z_x$ as an input to Alice, and a subset of size $m$ of $z_y$ as an input to Bob.

For the distribution $\mu_0$ on 0-inputs one chooses a subset $x$ of size $m-1$ from $z_x$ and lets Alice's input be $x\cup\{i\}$, and correspondingly for Bob.

The distribution $\mu$ is the mixture $3/4\cdot \mu_1+1/4\cdot\mu_0$.
\end{definition}

Note that $\mu$ is the uniform distribution on all inputs for which both sets are of size $m$ and the intersection size is at most 1.

In our setting the partition $\rho$ of input positions (not to be confused with the partitions from the definition of $\mu$) is chosen independently of the inputs (which are chosen from $\mu$). We next observe that usually enough pairs $x_j,y_j$ are split between Alice and Bob.


\begin{lemma}
Let $\rho$ be a uniformly random partition of the input variables $ x_1,\ldots, x_N,$ $y_1,\ldots, y_N$.
Then with probability $1-\delta$ we have that at least $N/4-\sqrt{N/\delta}$ pairs $x_j,y_j$ are distributed such that Alice holds $x_j$ and Bob holds $y_j$.
\end{lemma}

{\sc Proof:} For each pair, $x_j$ is on Alice's side and $y_j$ on Bob's side with probability $1/4$. Let $C$ denote the number of $j$'s for which this is the case. The expectation of $C$ is $N/4$ and the variance is $3N/16$. Hence, by Chebyshev's inequality, the probability that $|C-N/4|$ is larger than $(1/\sqrt\delta) (\sqrt 3/4) \sqrt N$ is at most $\delta$. $\hfill\Box$
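For completeness, the Chebyshev step can be spelled out as follows (with $t$ the deviation threshold):
\[\prob\left(|C-N/4|>t\right)\ \le\ \frac{\mathrm{Var}(C)}{t^2}\ =\ \frac{3N/16}{t^2},\]
which equals $\delta$ for $t=(1/\sqrt\delta)(\sqrt 3/4)\sqrt N$; since $t\le\sqrt{N/\delta}$, the bound stated in the lemma follows.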

We call partitions $\rho$ under which more than $n=N/5$ positions $j$ are such that $x_j$ is with Alice and $y_j$ with Bob {\em good partitions}. In the following we analyze protocols under an arbitrary fixed good partition, showing that protocols with low communication cost must have large error. Since the good partitions together have probability at least $1-\delta$, the expected communication is at least $(1-\delta)$ times the cost under the best good partition, and a linear lower bound for a random partition will follow. On the other hand, at most a $\delta$ fraction of all partitions can have error larger than $\epsilon/\delta$.

So fix any good partition $\rho$. For notational simplicity assume that $x_1,\ldots, x_n$ belong to Alice, and $y_1,\ldots, y_n$ to Bob. The remaining inputs are distributed in any other way. We consider inputs drawn from the distribution $\mu$.

Note first that the probability (under $\mu$) that the ``intersection position'' $i$ is larger than $n$, so that both $x_i$ and $y_i$ might belong to Alice (or to Bob), is $4/5$; in this case we assume the protocol already knows the answer without communication. We therefore have to assume $\epsilon<1/20$ to get a lower bound: $i$ is contained in both $x$ and $y$ with probability $1/4$, so a protocol that answers ``disjoint'' whenever $i\le n$ errs with probability $1/5\cdot 1/4=1/20$, making error $1/20$ trivial to achieve. If the error is smaller than, say, $1/50$, then the protocol must give the correct output for most inputs whose intersection position $i$ is at most $n$.

Define $\mu_\rho$ as the distribution $\mu$ restricted to $i$ (as part of the frame) being chosen from $1$ to $n$. Note that $\mu=1/5\cdot\mu_\rho+4/5\cdot \sigma$ for some distribution $\sigma$. Since the protocols $P_\rho$ we consider have error $\epsilon_\rho$ on $\mu$ (using partition $\rho$), they can have error at most $5\epsilon_\rho$ on $\mu_\rho$. So we restrict our attention to $\mu_\rho$.
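The factor $5$ comes from decomposing the error of $P_\rho$ over the mixture (writing $\mathrm{err}_\tau$, our shorthand, for the error probability under input distribution $\tau$):
\[\epsilon_\rho\ =\ \frac15\,\mathrm{err}_{\mu_\rho}(P_\rho)+\frac45\,\mathrm{err}_{\sigma}(P_\rho)\ \ge\ \frac15\,\mathrm{err}_{\mu_\rho}(P_\rho),\]
so that $\mathrm{err}_{\mu_\rho}(P_\rho)\le 5\epsilon_\rho$.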

$\mu_{\rho,1}$ is the distribution $\mu_1$ restricted to $i$ being from $1$ to $n$, and $\mu_{\rho,0}$ is the distribution $\mu_0$ under the same restriction. We denote the random variables of inputs chosen from $\mu_{\rho,1}$ by ${\bf x_0},{\bf y_0}$ (the subscript indicating intersection size 0), and the random variables of inputs chosen from $\mu_{\rho,0}$ by ${\bf x_1},{\bf y_1}$. ${\bf x,y}$ is the pair of random variables drawn from $\mu_\rho$; note that $\bf x$ is the random variable corresponding to the marginal of $\mu_\rho$ on the $x$-variables.


Furthermore we will denote by ${\bf a},{\bf b}$ etc.~the random variables ${\bf x},{\bf y}$ grouped into Alice's and Bob's inputs. Note that the first $n$ variables in $\bf x$ belong to Alice and the first $n$ variables in $\bf y$ belong to Bob; the other variables are assigned in any way. Subscripts again indicate drawing inputs from $\mu_{\rho,0}$ or $\mu_{\rho,1}$.


We can now state the Main Lemma.


\begin{lemma}
Let $\rho$ be a good partition. ${\bf x},{\bf y}$ are chosen from $\mu_\rho$ (i.e., $\mu$ conditioned on $i\le n$), and partitioned into random variables as described above.
Let $R=\bar{A}\times\bar{B}$ denote a rectangle of size at least $2^{-\zeta N}$ (under $\mu_\rho$) in the communication matrix induced by $\DISJ$ and $\rho$, for some small constant $\zeta>0$.

Then \[\prob\left(({\bf a_1},{\bf b_1})\in R\right)\geq\alpha\cdot\prob\left(({\bf a_0},{\bf b_0})\in R\right), \]
for some constant $\alpha>0$.
\end{lemma}

In other words, every rectangle $R$ that tries to cover mostly 1-inputs to $\DISJ$ is either small or corrupted by a constant fraction of 0-inputs. A low-error cover made up of rectangles hence needs to contain many rectangles.

Now we show that the main lemma implies the lower bound for $\DISJ$.
The randomized average partition communication complexity is lower bounded by $E_\rho E_\mu c_{\rho,z}$, where $c_{\rho, z}$ is the communication cost on input $z$ under partition $\rho$. Since the good partitions together have probability at least $1-\delta$, it suffices to show that for any good partition $\rho$ the average communication satisfies $E_\mu c_{\rho,z}=\Omega(N)$.

We have $\prob_{\rho,\mu}(c_{\rho,z}>c/\delta)<\delta$ by Markov's inequality. Hence, if we stop all protocols $P_\rho$ once their communication on an input $z$ exceeds $c/\delta$, we increase the error by at most $\delta$. In the following we can therefore assume that for each $\rho$ and $z$ we have $c_{\rho,z}\leq c/\delta$, and in particular that $c_\rho\leq c/\delta$.
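This is again Markov's inequality, applied to the nonnegative random variable $c_{\rho,z}$ with expectation $c$:
\[\prob_{\rho,\mu}\left(c_{\rho,z}>c/\delta\right)\ \le\ \frac{E_{\rho,\mu}\left[c_{\rho,z}\right]}{c/\delta}\ =\ \frac{c}{c/\delta}\ =\ \delta.\]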

The protocol $P_\rho$ has communication at most $c/\delta$, and corresponds to a partition of the communication matrix according to $\rho$ into $2^{c/\delta}$ rectangles with total error $\epsilon_\rho+\delta$ under $\mu$.
Since $\mu_\rho$ makes up one fifth of $\mu$, the error under $\mu_\rho$ can be at most $5(\epsilon_\rho+\delta)$. Furthermore, the inputs in the support of $\mu_\rho$ (i.e., the inputs with intersection size at most $1$ and the intersection, if any, at position $n$ or less) are partitioned into $2^{c/\delta}$ rectangles with error $10(\epsilon_\rho+\delta)$. Since $\mu_\rho$ puts weight $3/4$ on inputs $x,y$ that are disjoint, there must be a single rectangle $R$ that covers $(3/12)\cdot2^{-c/\delta}$ of the 1-inputs and at the same time has error at most $30(\epsilon/\delta+\delta)$ (both under $\mu_\rho$). The Main Lemma now yields a contradiction if $c/\delta<\zeta N$ and $30(\epsilon/\delta+\delta)$ is a suitably small constant; hence $c=\Omega(N)$.
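The single rectangle $R$ is extracted by a standard averaging argument: rectangles that each cover less than $(3/12)\cdot 2^{-c/\delta}$ of the 1-inputs (under $\mu_\rho$) together cover less than
\[2^{c/\delta}\cdot\frac{3}{12}\cdot 2^{-c/\delta}\ =\ \frac14,\]
while the 1-inputs have total weight $3/4$; hence most of the 1-mass lies in rectangles at least this large, and among those a rectangle with error at most three times the average exists by Markov's inequality.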



We now proceed to show the main lemma.
In Razborov's proof the rectangles $R$ to be analyzed are product sets with respect to the divide between $x$ and
$y$. For us this is not the case, because Alice and Bob hold both $x$- and $y$-inputs. However,
Alice holds $x_1,\ldots, x_n$ and Bob $y_1,\ldots, y_n$. Call all the remaining variables (which are distributed arbitrarily) $O$.

In the statement of the Main Lemma $\alpha/(1+\alpha)$ is (a lower bound on) the error of $R$ under $\mu_\rho$.
Since $\alpha$ is a constant that we will determine later, it is enough to show a constant lower bound on the error of any large rectangle $R$. Denote the error of $R$ under $\mu_\rho$ by $\alpha'$.

The rectangle $R$ can be partitioned into smaller rectangles by fixing the variables $O$ to some value $o$. We denote a resulting rectangle by $R^o$ (note that $R^o$ is indeed a rectangle). Let $\gamma_o$ denote the average error of $R^o$ under $\mu_\rho$, normalized to $R^o$ (where $o$ is chosen with probability proportional to the size of $R^o$). Clearly the expectation of $\gamma_o$ is $\alpha'$.

$R^o=\bar{A}^o\times \bar{B}^o$ corresponds to a rectangle in $\{0,1\}^n\times\{0,1\}^n$ by ignoring fixed input variables.
 $R$ itself is a rectangle in $\{0,1\}^N\times\{0,1\}^N$.
 $o$ fixes $x,y$ on the last $N-n$ input positions, and if we define $a$ as $x\cap\{1,\ldots,n\}$ and $b=y\cap\{1,\ldots, n\}$, then $m_a=|a|$ and $m_b=|b|$ are fixed on $R^o$.

For every $a,b$ with fixed $m_a,m_b$
there are
\[f_{a,b}={N-n\choose m-m_a}\cdot{N-n-m+m_a\choose m-m_b}\]
extensions $o$, and hence
as many (disjoint) rectangles $R^o$ in $R$. Later we will restrict our attention to $o$ for which $m_a,m_b$ are very close to $n/4$. Recall that $R$ has size $\mu_\rho(R)\geq 2^{-\zeta N}$ by the assumption in the Main Lemma.

  To apply a modification of Razborov's original argument for $\DISJ$ on $\{0,1\}^n\times\{0,1\}^n$ we want to find an $o$ such that $R^o$
  \begin{enumerate}
  \item is large (as a rectangle in $\{0,1\}^n\times\{0,1\}^n$)
  \item has small error
  \item the sizes of $x\cap\{1,\ldots, n\}$ and $y\cap\{1,\ldots, n\}$ are roughly $n/4$ for all inputs in $R^o$.
  \end{enumerate}

  The rectangles $R^o$ with $\mu_\rho(R^o)<f_{a,b}^{-1}\cdot2^{-2\zeta N}$ contribute at most $2^{-2\zeta N}$ to $R$ and can thus be disregarded without changing the error much (even if the small $R^o$ contain only 1-inputs).
  We now choose $o$ by picking $x,y$ from $\mu_\rho$ restricted to $R$, and removing $x_1,\ldots, x_n,y_1,\ldots, y_n$. This chooses an $R^o$ with probability proportional to its size, and hence the expected error is at most $2\alpha'$, and the probability that the error is larger than $2\alpha'/\delta$ is at most $\delta$.


  We now have to consider the sizes of the sets $a=x\cap\{1,\ldots, n\}$ and $b=y\cap\{1,\ldots, n\}$. Note that $o$ fixes some $k_x$ of the $x$-variables to $1$ and some $k_y$ of the $y$-variables; the size of $a$ is $m-k_x$, and the size of $b$ is $m-k_y$.


The following claim follows from a simple application of the Chernoff inequality.
\begin{claim}
Choose $o$ by choosing $x,y$ from $\mu_\rho$ and discarding the variables outside $O$.
\[\prob(|\ |a|-n/4\ |>\kappa n\mbox{ or } |\ |b|-n/4\ |>\kappa n)\leq 2^{-\kappa^2 n/3}.\]
\end{claim}

For a small enough $\zeta$ (namely $\zeta N\ll \kappa^2 n/3$) this means that when we choose $o$ as above (i.e., $x,y$ from $R$ according to $\mu_\rho$, dropping the unneeded variables), then with probability $>2/3$ the sizes of $a$ and $b$ are within $\pm\kappa n$ of $n/4$.
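For concreteness: conditioning on the event $(x,y)\in R$ costs at most a factor $\mu_\rho(R)^{-1}\leq 2^{\zeta N}$, so
\[\prob\bigl(|\ |a|-n/4\ |>\kappa n\mbox{ or } |\ |b|-n/4\ |>\kappa n\ \bigm|\ (x,y)\in R\bigr)\ \leq\ \frac{2^{-\kappa^2 n/3}}{\mu_\rho(R)}\ \leq\ 2^{\zeta N-\kappa^2 n/3},\]
which is below $1/3$ once $\kappa^2 n/3$ sufficiently exceeds $\zeta N$.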

Hence we can find $o$ such that the above three criteria are satisfied, and all that is left to do is to show that $R^o$ must have large error after all. By now we are almost in the same situation as Razborov, except that the sets $a,b$ are not exactly of size $n/4$.

  Denote by $m_a$ and $m_b$ the sizes of $a,b$ respectively. We consider the distribution $\nu$ which is defined as follows.




  \begin{definition}
Set $n=4m'-1$; there are $n$ input positions. To choose inputs one first chooses a {\it frame}. The frame consists of a partition of $\{1,\ldots, n\}$ into 3 disjoint subsets $z_a, z_b, \{i\}$ such that $|z_a|=|z_b|=2m'-1$.

For the distribution $\nu_1$ on 1-inputs to $\DISJ$ one then chooses a subset $a$ of size $m_a$ of $z_a$ as an input to Alice, and a subset of size $m_b$ of $z_b$ as an input to Bob.

For the distribution $\nu_0$ on 0-inputs one chooses a subset $a$ of size $m_a-1$ from $z_a$ and lets Alice's input be $a\cup\{i\}$, and correspondingly for Bob.

The distribution $\nu$ is the mixture $3/4\cdot \nu_1+1/4\cdot\nu_0$.
\end{definition}


Note that $\nu$ can be obtained from $\mu_\rho$ by dropping the last $N-n$ elements of the universe.
Since $\mu_\rho$ is invariant under permuting the last $N-n$ or the first $n$ elements of the universe, for all $x,y$ and the $a,b$ obtained from them we have
$\nu(a,b)=\mu_\rho(x,y)\cdot f_{a,b}$. In our case all $f_{a,b}$ are within a factor $2^{O(\kappa n)}$ of each other.
This means that our $R^o$ has size $\nu(R^o)\geq 2^{-2\zeta N-O(\kappa N)}$.






What is left to do is to prove Razborov's Main Lemma under the modification that the sets $x,y$ have fixed sizes that are slightly different from $n/4$. The argument is almost entirely the same, with some estimates looser than in the original.

We now restate that lemma. In this restatement we re-use the notation $\bf a_1$ etc., which now refers to the domain $\{0,1\}^n\times\{0,1\}^n$ and the distribution $\nu$.


\begin{lemma}
$a,b$ are chosen from $\nu$ (as above).
Let $R=\bar{A}\times\bar{B}$ denote a rectangle of size at least $2^{-\zeta' n}$ (under $\nu$), for some small constant $\zeta'>0$.

Then \[\prob\left(({\bf a_1},{\bf b_1})\in R\right)\geq\alpha\cdot\prob\left(({\bf a_0},{\bf b_0})\in R\right), \]
for some constant $\alpha>0$.
\end{lemma}

We omit the details of this argument. As discussed above, it implies our Main Lemma, and hence the lower bound for $\DISJ$.

To see that the above proof also applies if Alice knows all of $x$ and Bob knows all of $y$, note that we actually analyze the rectangles $R^o$, in which the extra inputs are fixed anyway.
\endinput
